The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Geometric rectification of images of distorted documents finds wide applications in document digitization and Optical Character Recognition (OCR). Although smoothly curved deformations have been widely investigated by many works, the most challenging distortions, e.g. complex creases and large foldings, have not been studied in particular. The performance of existing approaches, when applied to largely creased or folded documents, is far from satisfying, leaving substantial room for improvement. To tackle this task, knowledge about document rectification should be incorporated into the computation, among which the developability of 3D document models and particular textural features in the images, such as straight lines, are the most essential ones. For this purpose, we propose a general framework of document image rectification in which a computational isometric mapping model is utilized for expressing a 3D document model and its flattening in the plane. Based on this framework, both model developability and textural features are considered in the computation. The experiments and comparisons to the state-of-the-art approaches demonstrated the effectiveness and outstanding performance of the proposed method. Our method is also flexible in that the rectification results can be enhanced by any other methods that extract high-quality feature lines in the images.
translated by 谷歌翻译
Shape can specify key object constraints, yet existing text-to-image diffusion models ignore this cue and synthesize objects that are incorrectly scaled, cut off, or replaced with background content. We propose a training-free method, Shape-Guided Diffusion, which uses a novel Inside-Outside Attention mechanism to constrain the cross-attention (and self-attention) maps such that prompt tokens (and pixels) referring to the inside of the shape cannot attend outside the shape, and vice versa. To demonstrate the efficacy of our method, we propose a new image editing task where the model must replace an object specified by its mask and a text prompt. We curate a new ShapePrompts benchmark based on MS-COCO and achieve SOTA results in shape faithfulness, text alignment, and realism according to both quantitative metrics and human preferences. Our data and code will be made available at https://shape-guided-diffusion.github.io.
translated by 谷歌翻译
具有对比性学习目标的预训练方法在对话了解任务中表现出了显着的成功。但是,当前的对比学习仅将自调查的对话样本视为正样本,并将所有其他对话样本视为负面样本,即使在语义上相关的对话框中,也会强制执行不同的表示。在本文中,我们提出了一个树木结构化的预培训对话模型Space-2,该模型从有限标记的对话框和大规模的无标记的对话框COLPORA通过半监督的对比度预培训来学习对话框表示。具体而言,我们首先定义一个通用的语义树结构(STS),以统一不同对话框数据集的注释模式,以便可以利用所有标记数据中存储的丰富结构信息。然后,我们提出了一个新颖的多视图分数功能,以增加共享类似STS的所有可能对话框的相关性,并且在监督的对比预训练期间仅推开其他完全不同的对话框。为了充分利用未标记的对话,还增加了基本的自我监督对比损失,以完善学习的表示。实验表明,我们的方法可以在DialogLue基准测试中实现新的最新结果,该基准由七个数据集和四个流行的对话框组成。为了获得可重复性,我们在https://github.com/alibabaresearch/damo-convai/tree/main/main/space-2上发布代码和数据。
translated by 谷歌翻译
本文回顾了AIM 2022上压缩图像和视频超级分辨率的挑战。这项挑战包括两条曲目。轨道1的目标是压缩图像的超分辨率,轨迹〜2靶向压缩视频的超分辨率。在轨道1中,我们使用流行的数据集DIV2K作为培训,验证和测试集。在轨道2中,我们提出了LDV 3.0数据集,其中包含365个视频,包括LDV 2.0数据集(335个视频)和30个其他视频。在这一挑战中,有12支球队和2支球队分别提交了赛道1和赛道2的最终结果。所提出的方法和解决方案衡量了压缩图像和视频上超分辨率的最先进。提出的LDV 3.0数据集可在https://github.com/renyang-home/ldv_dataset上找到。此挑战的首页是在https://github.com/renyang-home/aim22_compresssr。
translated by 谷歌翻译
乳腺癌是女性癌症死亡的主要原因之一。作为乳房筛查的主要输出,乳房超声(US)视频包含用于癌症诊断的独家动态信息。但是,视频分析的培训模型是不平凡的,因为它需要一个大量的数据集,而注释也很昂贵。此外,乳房病变的诊断面临着独特的挑战,例如类间相似性和阶层内变异。在本文中,我们提出了一种开创性的方法,该方法直接利用了计算机辅助乳腺癌诊断中的视频。它利用掩盖的视频建模作为预防性的,以减少对数据集大小和详细注释的依赖。此外,开发了相关性的对比损失,以促进良性和恶性病变之间内部和外部关系的识别。实验结果表明,我们提出的方法实现了有希望的分类性能,并且可以超越其他最先进的方法。
translated by 谷歌翻译
稀疏奖励学习通常在加强学习(RL)方面效率低下。 Hindsight Experience重播(她)已显示出一种有效的解决方案,可以处理低样本效率,这是由于目标重新标记而导致的稀疏奖励效率。但是,她仍然有一个隐含的虚拟阳性稀疏奖励问题,这是由于实现目标而引起的,尤其是对于机器人操纵任务而言。为了解决这个问题,我们提出了一种新型的无模型连续RL算法,称为Relay-HER(RHER)。提出的方法首先分解并重新布置原始的长马任务,以增量复杂性为新的子任务。随后,多任务网络旨在以复杂性的上升顺序学习子任务。为了解决虚拟阳性的稀疏奖励问题,我们提出了一种随机混合的探索策略(RME),在该策略中,在复杂性较低的人的指导下,较高复杂性的子任务的实现目标很快就会改变。实验结果表明,在五个典型的机器人操纵任务中,与香草盖相比,RHER样品效率的显着提高,包括Push,Pickandplace,抽屉,插入物和InstaclePush。提出的RHER方法还应用于从头开始的物理机器人上的接触式推送任务,成功率仅使用250集达到10/10。
translated by 谷歌翻译
我们提出了一种称为独角兽的统一方法,可以使用相同的模型参数同时使用单个网络解决四个跟踪问题(SOT,MOT,VOS,MOTS)。由于对象跟踪问题本身的定义零散,因此开发了大多数现有的跟踪器来解决任务的单个或一部分,并过分地对特定任务的特征进行了专业化。相比之下,Unicorn提供了一个统一的解决方案,在所有跟踪任务中采用相同的输入,骨干,嵌入和头部。我们第一次完成了跟踪网络体系结构和学习范式的巨大统一。Unicorn在8个跟踪数据集中的特定于任务特定的对应物(包括Lasot,TrackingNet,Mot17,BDD100K,Davis16-17,MOTS20和BDD100K MOT)在PAR上或更好的对应物。我们认为,独角兽将是朝着一般视觉模型迈出的坚实一步。代码可从https://github.com/masterbin-iiau/unicorn获得。
translated by 谷歌翻译
超声(US)广泛用于实时成像,无辐射和便携性的优势。在临床实践中,分析和诊断通常依赖于美国序列,而不是单个图像来获得动态的解剖信息。对于新手来说,这是一项挑战,因为使用患者的足够视频进行练习是临床上不可行的。在本文中,我们提出了一个新颖的框架,以综合高保真美国视频。具体而言,合成视频是通过基于给定驾驶视频的动作来动画源内容图像来生成的。我们的亮点是三倍。首先,利用自我监督学习的优势,我们提出的系统以弱监督的方式进行了培训,以进行关键点检测。然后,这些关键点为处理美国视频中的复杂动态动作提供了重要信息。其次,我们使用双重解码器将内容和纹理学习解除,以有效地减少模型学习难度。最后,我们采用了对抗性训练策略,并采用了GAN损失,以进一步改善生成的视频的清晰度,从而缩小了真实和合成视频之间的差距。我们在具有高动态运动的大型内部骨盆数据集上验证我们的方法。广泛的评估指标和用户研究证明了我们提出的方法的有效性。
translated by 谷歌翻译
回归学习是经典的,是医学图像分析的基础。它为许多关键应用程序提供了连续的映射,例如属性估计,对象检测,分割和非刚性注册。但是,先前的研究主要以案例标准(如均方误差)为优化目标。他们忽略了非常重要的人口相关标准,这正是许多任务中的最终评估指标。在这项工作中,我们建议通过有关直接优化细粒相关损失的新型研究来重新审视经典回归任务。我们主要探索两个互补相关索引作为可学习的损失:Pearson线性相关(PLC)和Spearman等级相关性(SRC)。本文的贡献是两个折叠。首先,对于全球层面的PLC,我们提出了一项策略,以使其对异常值进行强大的态度并规范关键分布因素。这些努力显着稳定学习并扩大了PLC的功效。其次,对于本地级别的SRC,我们提出了一种粗到精细的方案,以减轻样品之间确切排名顺序的学习。具体而言,我们将样本排名的学习转换为样本之间相似关系的学习。我们在两个典型的超声图像回归任务上广泛验证了我们的方法,包括图像质量评估和生物措施测量。实验证明,通过直接优化相关性的细粒度指导,回归性能得到显着提高。我们提出的相关性损失是一般的,可以扩展到更重要的应用程序。
translated by 谷歌翻译